    SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors

    The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, this high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for large databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate SW protein database search. SWAPHI is designed around the scale-and-vectorize approach: it boosts alignment speed by effectively exploiting both the coarse-grained parallelism across the many co-processing cores (scale) and the fine-grained parallelism from the 512-bit wide single instruction, multiple data (SIMD) vectors within each core (vectorize). Searching against the large UniProtKB/TrEMBL protein database, SWAPHI achieves a performance of up to 58.8 billion cell updates per second (GCUPS) on one coprocessor and up to 228.4 GCUPS on four coprocessors. Furthermore, it demonstrates good parallel scalability over varying numbers of coprocessors, and with four coprocessors it is also superior to both SWIPE on 16 high-end CPU cores and BLAST+ on 8 cores, with maximum speedups of 1.52 and 1.86, respectively. SWAPHI is written in C++ (with a set of SIMD intrinsics) and is freely available at http://swaphi.sourceforge.net. Comment: A short version of this paper has been accepted by the IEEE ASAP 2014 conference.
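    As background for the quadratic-time recurrence that SWAPHI vectorizes, here is a minimal scalar Smith-Waterman sketch in C++ with a linear gap penalty. It is illustrative only: the function name, scoring values, and test sequences are assumptions, not SWAPHI's configuration, and SWAPHI itself computes this recurrence with 512-bit SIMD intrinsics across many cores.

    // Minimal scalar Smith-Waterman, linear gap penalty. Scores and
    // sequences are assumed values for illustration, not the tool's.
    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    int smith_waterman(const std::string& a, const std::string& b,
                       int match = 2, int mismatch = -1, int gap = -2) {
        size_t m = a.size(), n = b.size();
        // H[i][j] = best local alignment score ending at a[i-1], b[j-1]
        std::vector<std::vector<int>> H(m + 1, std::vector<int>(n + 1, 0));
        int best = 0;
        for (size_t i = 1; i <= m; ++i) {
            for (size_t j = 1; j <= n; ++j) {
                int diag = H[i-1][j-1] + (a[i-1] == b[j-1] ? match : mismatch);
                H[i][j] = std::max({0, diag, H[i-1][j] + gap, H[i][j-1] + gap});
                best = std::max(best, H[i][j]);
            }
        }
        return best;  // O(m*n) cell updates -- the basis of the GCUPS metric
    }

    int main() {
        std::cout << smith_waterman("HEAGAWGHEE", "PAWHEAE") << "\n";
        return 0;
    }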

    Stability and sensitivity analysis of stochastic programs with second order dominance constraints

    In this paper we present stability and sensitivity analysis of a stochastic optimization problem with stochastic second order dominance constraints. We consider perturbation of the underlying probability measure in the space of regular measures equipped with the pseudometric discrepancy distance ([30]). By exploiting a result on error bounds in semi-infinite programming due to Gugat [13], we show under the Slater constraint qualification that the optimal value function is Lipschitz continuous and the optimal solution set mapping is upper semicontinuous with respect to the perturbation of the probability measure. In particular, we consider the case when the probability measure is approximated by the empirical probability measure and show the exponential rate of convergence of the optimal solutions obtained from solving the approximation problem. The analysis is extended to the stationary points when the objective function is nonconvex.
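    For readers unfamiliar with the setup, the LaTeX sketch below gives a standard textbook formulation of a second-order dominance (SSD) constrained program; the symbols are generic notation assumed for illustration, not necessarily the paper's, and the exact problem statement should be taken from the paper itself.

    % Generic SSD-constrained program: G(x,\xi) is a random outcome
    % depending on decision x, and Y is a benchmark random variable.
    \[
    \begin{aligned}
      \min_{x \in X} \quad & \mathbb{E}\bigl[ f(x,\xi) \bigr] \\
      \text{s.t.} \quad & G(x,\xi) \succeq_{(2)} Y,
    \end{aligned}
    \]
    % where the SSD relation is characterized by expected shortfalls:
    \[
      G(x,\xi) \succeq_{(2)} Y
      \iff
      \mathbb{E}\bigl[(\eta - G(x,\xi))_+\bigr]
      \le \mathbb{E}\bigl[(\eta - Y)_+\bigr]
      \quad \text{for all } \eta \in \mathbb{R}.
    \]

    The "for all eta" form makes the constraint semi-infinite, which is why the error-bound result of Gugat [13] for semi-infinite programming cited in the abstract is relevant.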

    Numerical Methods for Distributed Stochastic Compositional Optimization Problems with Aggregative Structure

    This paper studies distributed stochastic compositional optimization problems over networks, where the agents share an inner-level function that is the sum of each agent's private expectation function. Focusing on this aggregative structure of the inner-level function, we employ the hybrid variance reduction method to estimate each agent's private expectation function, and apply the dynamic consensus mechanism to track the shared inner-level function. Combining these with the standard distributed stochastic gradient descent method, we propose a distributed aggregative stochastic compositional gradient descent method. When the objective function is smooth, the proposed method achieves the optimal convergence rate $\mathcal{O}(K^{-1/2})$. We further combine the proposed method with communication compression and propose a communication-compressed variant, which maintains the same optimal convergence rate $\mathcal{O}(K^{-1/2})$. Simulated experiments on decentralized reinforcement learning verify the effectiveness of the proposed methods.
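    To make the aggregative compositional structure concrete, the LaTeX sketch below writes out a generic form of such an objective over n agents; the notation is assumed for illustration and the exact model is in the paper.

    % Generic aggregative compositional objective over n agents.
    \[
      \min_{x} \; F(x) = \frac{1}{n} \sum_{i=1}^{n}
        f_i\!\Bigl( \underbrace{\textstyle\sum_{j=1}^{n}
          \mathbb{E}_{\xi_j}\bigl[ g_j(x, \xi_j) \bigr]}_{\text{shared inner level}} \Bigr).
    \]
    % Agent i can sample only its own g_i, so a consensus mechanism
    % must track the inner sum across the network, and variance
    % reduction controls the noise that the chain rule passes
    % through \nabla f_i when the gradient of F is estimated.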

    Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++

    [Abstract]: The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high-quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since the availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of length 150 base pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the multi-threaded original tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net).
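    To illustrate the dynamic scheduling idea from the abstract, here is a minimal sketch in which workers pull batches of reads from a shared atomic counter until no work remains. Plain std::thread stands in for the UPC++ distributed model the paper actually uses, and the batch count and align_batch stub are assumptions for illustration.

    // Dynamic scheduling sketch: a shared atomic counter hands out
    // read batches on demand, so fast workers naturally take more
    // work. kTotalBatches and align_batch are illustrative stubs.
    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    constexpr long kTotalBatches = 1000;  // assumed: reads grouped in batches
    constexpr int  kWorkers      = 8;

    void align_batch(long batch_id) {
        // Placeholder for the real aligner call on one batch of reads.
        (void)batch_id;
    }

    int main() {
        std::atomic<long> next_batch{0};  // shared work counter
        std::vector<std::thread> pool;
        for (int w = 0; w < kWorkers; ++w) {
            pool.emplace_back([&next_batch] {
                for (;;) {
                    long b = next_batch.fetch_add(1);  // claim the next batch
                    if (b >= kTotalBatches) break;     // no work left
                    align_batch(b);
                }
            });
        }
        for (auto& t : pool) t.join();
        std::printf("processed %ld batches\n", kTotalBatches);
        return 0;
    }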

    Sub-daily simulation of mountain flood processes based on the modified soil water assessment tool (SWAT) model

    Floods not only provide a large amount of water resources, but they can also cause serious disasters. Although there have been numerous hydrological studies on flood processes, most of these investigations were based on rainfall-driven floods in plain areas. Few studies have examined high-temporal-resolution snowmelt floods in high-altitude mountainous areas. The Soil Water Assessment Tool (SWAT) model is a typical semi-distributed hydrological model widely used in runoff and water-quality simulations. The degree-day factor method used in SWAT relies only on the average daily temperature as the criterion for snow melting and ignores the influence of accumulated temperature. We therefore added the influence of accumulated temperature on snowmelt by extending the conditions used to discriminate rain from snow, making the model more suitable for simulating snowmelt processes in high-altitude mountainous areas. Building on the daily-scale model, the flood process was then simulated at an hourly scale. Comparing the results before and after the modification shows that the peak error decreased by 77% and the timing error was reduced from ±11 h to ±1 h. This study provides an important reference for flood simulation and forecasting in mountainous areas.
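    As a reference point for the modification described above, the sketch below implements the classic degree-day snowmelt formula, M = DDF * (T_avg - T_melt) for T_avg above the melt threshold, with a simple accumulated-temperature gate added. The parameter values and the form of the gate are assumptions for illustration, not the paper's calibrated modification.

    // Classic degree-day snowmelt with an assumed accumulated-
    // temperature condition; all parameter values are illustrative.
    #include <algorithm>
    #include <cstdio>

    double degree_day_melt(double t_avg_c, double accumulated_deg_days,
                           double ddf_mm_per_deg_day = 4.0,  // assumed DDF
                           double t_melt_c = 0.0,
                           double acc_threshold = 5.0) {     // assumed gate
        // Require both a warm day and enough accumulated warmth.
        if (t_avg_c <= t_melt_c || accumulated_deg_days < acc_threshold)
            return 0.0;
        return ddf_mm_per_deg_day * (t_avg_c - t_melt_c);  // mm of melt
    }

    int main() {
        double acc = 0.0;
        const double temps[] = {-2.0, 1.0, 3.0, 4.0, 6.0};  // toy daily T_avg
        for (double t : temps) {
            acc += std::max(0.0, t);  // accumulate positive degree-days
            std::printf("T=%5.1f C  melt=%5.2f mm\n", t,
                        degree_day_melt(t, acc));
        }
        return 0;
    }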

    Parallelized short read assembly of large genomes using de Bruijn graphs

    BACKGROUND: Next-generation sequencing technologies have given rise to an explosive increase in DNA sequencing throughput, and have promoted the recent development of de novo short-read assemblers. However, existing assemblers require long execution times and a large amount of compute resources to assemble large genomes from large quantities of short reads. RESULTS: We present PASHA, a parallelized short-read assembler using de Bruijn graphs, which takes advantage of hybrid computing architectures consisting of both shared-memory multi-core CPUs and distributed-memory compute clusters to gain efficiency and scalability. Evaluation using three small-scale real paired-end datasets shows that PASHA is able to produce more contiguous high-quality assemblies in shorter time than three leading assemblers: Velvet, ABySS and SOAPdenovo. PASHA's scalability for large genome datasets is demonstrated with a human genome assembly. Compared to ABySS, PASHA achieves competitive assembly quality with faster execution on the same compute resources, yielding an NG50 contig size of 503, with the longest correct contig size of 18,252, and an NG50 scaffold size of 2,294. Moreover, the human assembly is completed in about 21 hours with only modest compute resources. CONCLUSIONS: Developing parallel assemblers for large genomes has garnered significant research effort due to the explosive size growth of high-throughput short-read datasets. By employing hybrid parallelism consisting of multi-threading on multi-core CPUs and message passing on compute clusters, PASHA is able to assemble the human genome with high quality and in reasonable time using modest compute resources.
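    To make the de Bruijn graph idea concrete, the following minimal sketch splits reads into k-mers and records each k-mer as an edge between its (k-1)-mer prefix and suffix. The tiny k, toy reads, and single in-memory map are illustrative assumptions, far removed from PASHA's parallel, distributed construction.

    // Minimal de Bruijn graph sketch: each read of length L yields
    // L-k+1 k-mers; nodes are (k-1)-mers, edges connect a k-mer's
    // prefix to its suffix. k and the reads are toy assumptions.
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    int main() {
        const int k = 4;                          // assumed k-mer size
        const std::vector<std::string> reads = {  // toy reads
            "ACGTACGT", "CGTACGTT"};
        // adjacency: (k-1)-mer prefix -> list of (k-1)-mer suffixes
        std::unordered_map<std::string, std::vector<std::string>> graph;
        for (const auto& r : reads) {
            if ((int)r.size() < k) continue;
            for (size_t i = 0; i + k <= r.size(); ++i) {
                std::string kmer = r.substr(i, k);
                graph[kmer.substr(0, k - 1)].push_back(kmer.substr(1));
            }
        }
        for (const auto& [node, outs] : graph)    // print the edge list
            for (const auto& s : outs)
                std::cout << node << " -> " << s << "\n";
        return 0;
    }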